library(tidyverse)
library(tidycensus)
library(tigris)
library(mapview)
library(sf)
library(units)Merging Blocks and TAZs
TODO:
- Remove duplicate rooftops: `rooftops |> st_drop_geometry() |> group_by(STRUCT_ID) |> tally() |> filter(n>1) |> View()` Example: STRUCT_ID == “250858_862528”. There are also STRUCT_IDs with values of “NA” that we should explore. Are they duplicate? Are they just unlabeled?
- Remove vestigial commented out code. It’s going to be confusing.
- Export only the allocation table: GEOID, TAZ, share_of_geoid_to_TAZ_method1 (then repeat for methods 2 and 3). Once you’re happy you can join them together to compare. In the end, we’ll keep one method. We also want to be able to attach the dasymetric method later too.
- Check the all area calc to see if the coast line is included. We really dont need that no matter what. It may just be a not great method and we dont worry much about it.
- Check the number of CBG to TAZ relationships that exist. They ought to be the same(?) if we include 0% shares, which might not appear in a table.
- Look at building footprints dataset for whole country by Microsoft- Emily
Notes:
- You don’t have to wrap select() arguments in c().
- Using tidycensus gives you a cleaner table to start with rather than grabbing a flat file from the census and storing in in a folder. The end result should be the same.
- You could do all the land area/rooftop/water area joins in one steps if you calc it at the start then keep track of it all. alternatively we can just keep this as is, then create a fresh notebook with the chosen solution.
General Goal
The TDM outputs results at a TAZ level. However, for equity analyses, we need to allocated census demographics to those TAZs–the model does not output results based on minority status, income status, or any other status that might be interesting for an equity analysis.
Census geometry does NOT fit neatly into TAZ geometry. We would like to allocate pieces of census demographics to surrounding TAZs to create demographics of the TAZs. We need to create a table that we can use to allocate portions of census geometry into TAZs.
Problems:
- TAZs are mostly made up of blocks, but they’re not exactly made up of blocks.
- There are small tweaks to the TAZ geometry throughout the model region that mean lines do not overlap exactly.
- Census blocks are NOT necessarily the best starting point because the differential privacy features of the 2020 census have rendered them unreliable.
- TAZs may be smaller than a census geometry.
Constraints:
TAZ geometry may change over time. The process should be repeatable for arbitrary TAZ geometry (this essentially means for ANY geometry).
We want the process to be maintainable over multiple ACS iterations.
Other Goals
- Create multiple methods to compare the sensitivity to methodology.
Potential Solutions
When joining 2010 census blocks to TAZs, Paul Reim performed a series of intersections to identify where blocks were split. Where a TAZ split a block, he counted rooftops and used StreetView to estimate how many rooftops were in each TAZ. It is not desirable to maintain such a process–the reproducibility is limited and its highly manual. We can replicate the methodology using the rooftops layer. There are other solutions available as well:
- Rooftops
- Allocate based on where the largest piece of the rooftop is
- Where the centroid of the roof is
- Allocate proportionally based on the rooftop area.
- Land Area
- Total Area
- Dasymetric Mapping
Useful references
massgis census: https://www.mass.gov/info-details/massgis-data-2020-us-census
MassDOT rooftops: https://www.mass.gov/info-details/massgis-data-building-structures-2-d
Set up Packages
Load Census, TAZ, and Rooftop Shapes
#Load TAZ shapes
TAZ <- sf::read_sf("J:/Shared drives/Data_Requests/TAZ_Shapefiles/R_Script/out/geopackage/TAZ19.gpkg") %>%
st_transform(26986) %>%
filter(STATE == "MA") # Filtering to only in MA
TAZ$TAZ_area <- st_area(TAZ)
#filter(ID == 4398) # Filtering for one example
# OBtain the 2020 block_groups from the tigris package.
# The CB false (the more detailed) geometry looked like it matched up better, so we move forward with that path.
block_groups <- tigris::block_groups(state = "MA",
cb = FALSE,
year = 2020) %>%
st_transform(26986)
# Clip the block_groups to match the geometry of the TAZ file. This will eliminate coastal waters but retain inland waters. This is a useful starting place to have a similar coast line. This will also make our `erase_water` function work on a more similar set of water.
block_groups_nocoast <-
st_intersection(block_groups, TAZ |> summarize()) |>
st_collection_extract("POLYGON")Warning: attribute variables are assumed to be spatially constant throughout all
geometries
# Review the differences. It looks like what we'd expect. Coastal waters are not in the dataset anymore.
# mapview(block_groups) +
# mapview(block_groups_nocoast , col.regions = "orange")
# Overwrite the original file with the no_coast version and clean up the workspace.
block_groups <- block_groups_nocoast
rm(block_groups_nocoast)
#massgis rooftops: https://www.mass.gov/info-details/massgis-data-building-structures-2-d
# Download the file into "./data/base/" and then read it in here. Save it as a .rds for easy loading. Optionally, we could
# rooftops <- st_read("data/base/STRUCTURES_POLY.shp")
#
# saveRDS(rooftops, "data/processed/rooftops.rds")
# Load Population Census Data
# downloaded from: https://data.census.gov/table?t=Race+and+Ethnicity&g=0400000US25$1500000&tid=DECENNIALPL2020.P1
minority <- read_csv("data/base/DECENNIALPL2020.P1-Data.csv")New names:
Rows: 5117 Columns: 145
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(144): GEO_ID, NAME, P1_001N, P1_001NA, P1_002N, P1_002NA, P1_003N, P1_0... lgl
(1): ...145
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...145`
# P2_005N: Non-Hispanic/Latino, White-alone.
# P1_001N: Total Population.
minority_tidy <- tidycensus::get_decennial(
geography = "block group",
state = "MA",
variable = "P2_005N",
summary_var = "P1_001N",
year = 2020)Getting data from the 2020 decennial Census
Using the PL 94-171 Redistricting Data summary file
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
Only Looking at Plymouth County
We start by filtering out for just Plymouth County to keep the problem small while we work through the process.
# ply = Plymouth.
TAZ_ply <- TAZ %>%
filter(COUNTY == "Plymouth MA")
# 023 = Plymouth
# See: tidycensus::fips_codes |> filter(state == "MA", county_code == "023")
blk_group_ply <- block_groups %>%
filter(COUNTYFP == "023") %>%
group_by(GEOID) %>%
summarize()Reordering The Steps
Find Area of census block groups
blk_group_ply$BG_TOTAL_AREA <- (st_area(blk_group_ply) ) Intersect census block groups with TAZ
intersections <- st_intersection(TAZ_ply, blk_group_ply) %>%
st_collection_extract("POLYGON") %>%
rowid_to_column("Intersection_ID") # unique ID for each intersection geometryWarning: attribute variables are assumed to be spatially constant throughout all
geometries
intersections$INTERSECTION_TOTAL_AREA <- st_area(intersections)Erase Water From Intersections
intersections_land <- tigris::erase_water(intersections,
area_threshold = 0.5,
year = 2020)Fetching area water data for your dataset's location...
Erasing water area...
If this is slow, try a larger area threshold value.
Recalculate Land Area for the Intersections
intersections_land$INTERSECTION_LAND_AREA <- st_area(intersections_land)Find Total Land Area for Census Block Groups
intersections_land <- intersections_land %>%
group_by(GEOID) %>% #grouping by block group
mutate(BG_LAND_AREA = sum(INTERSECTION_LAND_AREA))Example
example <- intersections_land %>%
filter(ID == 1759)
mapview(TAZ_ply |>
filter(ID %in% example$ID),
col.regions = "red",
layer.name = "TAZ") +
mapview(blk_group_ply |>
filter(GEOID %in% example$GEOID),
alpha.regions = 0.01, lwd = 3,
layer.name = "CBG") +
mapview(example, zcol = "Intersection_ID",
layer.name = "Intersection")Allocate based on rooftops
Determine the Number of Rooftops in Each Intersection Area
We pull the pre-stored rooftops file in and show an example. Feel free to see how the file matches the footprints in the OSM layer and the aerial satellite layer.
# rooftops.rds reated previously.
rooftops <- read_rds("./data/processed/rooftops.rds")
mapview(rooftops |> filter(STRUCT_ID == "53226_895084"))Looking at repeated and NA IDs for rooftops
# Find the IDs of repeats and how many times they are repeated
repeats <- rooftops |> st_drop_geometry() |> group_by(STRUCT_ID) |> tally() |> filter(n>1)
View(repeats)
# Find the rooftops with unique STRUCT_IDs
unique_roof <- rooftops %>% filter(!(STRUCT_ID %in% repeats$STRUCT_ID)) %>%
select(STRUCT_ID, geometry)
# Find the rooftops with repeated or NA STRUCT_IDs
repeat_roof <- rooftops %>% filter(STRUCT_ID %in% repeats$STRUCT_ID) %>%
select(STRUCT_ID, geometry)
# All NAs will be replaced by a new id
repeat_roof <- repeat_roof |>
mutate(STRUCT_ID = coalesce(STRUCT_ID, paste0(row_number(),"_replaced_na") ) )
# Summarizing to get new geometry
repeat_roof <- repeat_roof %>%
group_by(STRUCT_ID) %>%
summarize()
mapview(repeat_roof)# combining the rooftops with unique ids with new summarized roof geometries
final_roofs <- bind_rows(unique_roof, repeat_roof)
# Some of the repeats are where there are buildings for schools, like UMass Boston
# Others are where there are multiple polygons that have the same struct ID and just look like a main building with an extra attachment.
# Option 1: summarize repeats and add back into no duplicates
# Option 2: Go through manually and take the largest
#NA_rooftops <- rooftops %>%
# filter(is.na(STRUCT_ID) == TRUE)
#x <- head(NA_rooftops,1)
#mapview(x)
# NA struct IDs seem to be randomWe now find which rooftops are in which intersection area.
#intersecting the rooftops with the census block groups
#roofs_in_bg <- st_intersection(blk_groups_land, rooftops)
# finding the number of rooftops in each census block group
# num_rooftops <- roofs_in_bg %>%
# group_by(GEOID) %>%
# mutate(num_roof = n()) %>%
# distinct(GEOID, num_roof)
# saveRDS(roofs_in_bg, "data/processed/rooftops_in_bg.rds")
# saveRDS(num_rooftops, "data/processed/num_rooftops.rds")
intersections_land <- intersections_land %>%
select(Intersection_ID, ID, GEOID, BG_TOTAL_AREA, INTERSECTION_TOTAL_AREA, INTERSECTION_LAND_AREA, BG_LAND_AREA)
# Joining the intersection areas with the rooftops that reside inside of them.
#
# See this thread for the basic idea. The default join for
# st_join is st_intersects, this is basically intersect the two and choose pull the info of the largest intersection area into the rooftop dataset.
# https://github.com/r-spatial/sf/issues/578
# NOTE: have to rethink the join to account for total bg area as opposed to the largest intersection area (see Jamboard)
# https://jamboard.google.com/d/1L5YrUrmOUpmQp8X6w2gaz51Fz6We8b0uV-Hm5g4aU2E/viewer?ts=63d15e99&f=1
# This took a very long time
roofs_in_intersect <- st_join(
final_roofs,
intersections_land,
left= FALSE, # inner join
largest = TRUE)Warning: attribute variables are assumed to be spatially constant throughout all
geometries
# Here's an example of what this is doing.
roofs_in_intersect_filt <- roofs_in_intersect |>
filter(ID %in% c(2708, 2709)) |>
mutate(Intersection_ID = as.character(Intersection_ID))
mapview(roofs_in_intersect_filt, zcol = "Intersection_ID") +
mapview(intersections |>
filter(ID %in% roofs_in_intersect_filt$ID),
alpha.region = 0.01, layer.name = "Intersection")num_rooftops <- roofs_in_intersect %>%
st_drop_geometry() %>%
group_by(Intersection_ID) %>%
mutate(num_roof = n()) %>%
distinct(Intersection_ID, ID, GEOID, BG_TOTAL_AREA, BG_LAND_AREA, INTERSECTION_TOTAL_AREA, INTERSECTION_LAND_AREA, num_roof)
saveRDS(roofs_in_intersect, "data/processed/rooftops_in_intersect.rds")
saveRDS(num_rooftops, "data/processed/num_rooftops_intersect.rds")Looking to see how many rooftops are in multiple intersections
roofs_in_intersect2 <- st_join(
final_roofs,
intersections_land,
left= FALSE) %>%
group_by(STRUCT_ID) %>%
mutate(num_intersections = n())
roofs_in_intersect2 <- roofs_in_intersect2 %>%
distinct(STRUCT_ID, num_intersections) %>%
filter(num_intersections > 1)
# 4 intersections: 6 rooftops
# 3 intersections: 133 rooftops
# 2 intersections: 90 rooftops
# Percent of rootops (240492) that are in 3+ intersections: 0.058%Determine the percentage of population that belongs to each intersection in the TAZ based on the number of rooftops
# Pull in the rooftops that were calculated in the last step.
roofs_in_intersect <- read_rds("./data/processed/rooftops_in_intersect.rds")
num_rooftops <- read_rds("data/processed/num_rooftops_intersect.rds")
# Calculate the share of the GEOID in each TAZ.
rooftop_share <- num_rooftops |>
group_by(GEOID) |>
mutate(rooftop_pct = num_roof/sum(num_roof))
write_csv(rooftop_share, "./output/data/rooftop_share_plymouth.csv")# Get population for block groups
census_pop <- minority_tidy %>%
select(c(GEOID, summary_value)) %>%
rename(population_BG = summary_value)
census_pop$GEOID <-str_remove_all(census_pop$GEOID, "1500000US")
pop_percent <- rooftop_share %>%
left_join(census_pop) %>%
# per BG: get population in each intersecting TAZ
mutate(population_intersection = rooftop_pct * as.numeric(population_BG)) %>%
group_by(ID) %>%
# per TAZ: add up the population in each intersecting BG
mutate(population_TAZ_roof = sum(population_intersection)) %>%
ungroup()Joining, by = "GEOID"
TAZ_pop_rooftops <- pop_percent %>%
distinct(ID, population_TAZ_roof)
TAZ_plot <- TAZ_ply %>%
left_join(TAZ_pop_rooftops)Joining, by = "ID"
mapview(TAZ_plot, zcol = "population_TAZ_roof")Allocate based on area
Land Area only
This performs an intersection and allocation between CBGs and TAZs based on land area (major water features are removed).
# Create the land area allocation table for export.
land_area_alloc <- intersections_land %>%
group_by(GEOID) |>
# per BG: get percentage of GEOID land area that is in each intersection
mutate(BG_LA_pct = drop_units(INTERSECTION_LAND_AREA / BG_LAND_AREA))
write_csv(land_area_alloc, "./output/data/landarea_share_plymouth.csv")
land_areas <- land_area_alloc %>%
left_join(census_pop) %>%
# per BG: get population in each intersecting TAZ
mutate(population_intersection = BG_LA_pct * as.numeric(population_BG)) %>%
group_by(ID) %>%
# per TAZ: add up the population in each intersecting BG
mutate(population_TAZ_LA = sum(population_intersection)) %>%
ungroup()Joining, by = "GEOID"
TAZ_pop_LandArea <- land_areas %>%
distinct(ID, population_TAZ_LA)
TAZ_plot <- TAZ_plot %>%
left_join(TAZ_pop_LandArea)Joining, by = "ID"
mapview(TAZ_plot, zcol = "population_TAZ_LA")Total Area
This performs an intersection and allocation between CBGs and TAZs based on all area, including water area.
# Create the full area (land + water) allocation table for export.
tot_area_alloc <- intersections_land %>%
group_by(GEOID) |>
# per BG: get percentage of all area that is in each intersecting TAZ
mutate(BG_AA_pct = drop_units(INTERSECTION_TOTAL_AREA / BG_TOTAL_AREA))
write_csv(tot_area_alloc, "./output/data/allarea_share_plymouth.csv")
tot_areas <- tot_area_alloc %>%
left_join(census_pop) %>%
# per BG: get population in each intersecting TAZ
mutate(population_intersection = BG_AA_pct * as.numeric(population_BG)) %>%
group_by(ID) %>%
# per TAZ: add up the population in each intersecting BG
mutate(population_TAZ_AA = sum(population_intersection)) %>%
ungroup()Joining, by = "GEOID"
TAZ_pop_TotArea <- tot_areas %>%
distinct(ID, population_TAZ_AA)
TAZ_plot_FINAL <- TAZ_plot %>%
left_join(TAZ_pop_TotArea)Joining, by = "ID"
mapview(TAZ_plot_FINAL, zcol = "population_TAZ_AA")# Pct Difference between the two
diff <- TAZ_pop_TotArea %>%
left_join(TAZ_pop_LandArea) %>%
mutate(pct_diff = population_TAZ_AA / population_TAZ_LA )Joining, by = "ID"
ggplot(diff, aes(x = pct_diff)) +
geom_histogram(binwidth = 0.25) +
scale_x_continuous(breaks = seq(0,25,1))hist(diff$pct_diff, 80)Dasymetric Allocation
Join the Roofs, Land Area, All Area Methods
# Join them all down here for exploration.
joined_estimates <- TAZ_pop_rooftops %>%
left_join(TAZ_pop_LandArea) %>%
left_join((TAZ_pop_TotArea))Joining, by = "ID"
Joining, by = "ID"
# Import the census block group pops once, then mutiply the populations here.